ReneWind

Renewable energy sources play an increasingly important role in the global energy mix as efforts to reduce the environmental impact of energy production intensify.

Among the renewable energy alternatives, wind energy is one of the most developed technologies worldwide. The U.S. Department of Energy has put together a guide to achieving operational efficiency using predictive maintenance practices.

Predictive maintenance uses sensor information and analysis methods to measure and predict degradation and future component capability. The idea behind predictive maintenance is that failure patterns are predictable: if component failure can be predicted accurately and the component is replaced before it fails, the costs of operation and maintenance will be much lower.

The sensors fitted across the different machines involved in the process of energy generation collect data related to various environmental factors (temperature, humidity, wind speed, etc.) and additional features related to various parts of the wind turbine (gearbox, tower, blades, brakes, etc.).

Objective

“ReneWind” is a company working on improving the machinery/processes involved in the production of wind energy using machine learning, and it has collected sensor data on generator failures of wind turbines. They have shared a ciphered version of the data, as the data collected through sensors is confidential (the type of data collected varies across companies). The data has 40 predictors, with 20,000 observations in the training set and 5,000 in the test set.

The objective is to build various classification models, tune them, and find the one that best identifies failures, so that generators can be repaired before they fail/break and the overall maintenance cost is reduced.

It is given that the cost of repairing a generator is much less than the cost of replacing it, and the cost of inspection is less than the cost of repair.

“1” in the target variable represents “failure” and “0” represents “no failure”.
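To make the cost trade-off concrete, here is a small sketch. The per-event cost figures are hypothetical; the brief only fixes their ordering (inspection < repair < replacement):

```python
# Hypothetical per-event costs -- only their ordering comes from the brief.
COST_INSPECTION = 1    # false positive: inspect a healthy generator
COST_REPAIR = 5        # true positive: repair the generator before it breaks
COST_REPLACEMENT = 40  # false negative: missed failure, generator must be replaced

def maintenance_cost(tp, fp, fn):
    """Total maintenance cost implied by a model's confusion-matrix counts."""
    return tp * COST_REPAIR + fp * COST_INSPECTION + fn * COST_REPLACEMENT

# A model that misses fewer failures (lower fn) is cheaper overall,
# which is why recall on the "failure" class matters most here.
print(maintenance_cost(tp=90, fp=30, fn=10))  # 90*5 + 30*1 + 10*40 = 880
```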

Data Description

Importing libraries

Loading Data

Creating a copy of the original data frame
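A minimal sketch of the loading-and-copying step. The file names and column layout are assumptions (the real data has 40 ciphered predictors); an in-memory CSV stands in for the actual files so the sketch runs anywhere:

```python
import io
import pandas as pd

# Stand-in for the real CSVs; in practice: pd.read_csv("Train.csv") etc.
# The ciphered predictors are unnamed, so V1/V2 here is purely illustrative.
train_csv = io.StringIO("V1,V2,Target\n0.1,-0.3,0\n1.2,0.8,1\n")
train = pd.read_csv(train_csv)

# Work on a copy so the original frame stays untouched.
df = train.copy()
print(df.shape)
```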

Data overview on the Train data

Observation:

Observation:

Observation:

Observation:

Observation:

Data overview on the Test data

Observation:

Observation:

Observation:

Observation:

Observation:

Observation:

EDA

Univariate Analysis

Plotting histograms and boxplots for all the variables

Plotting all the features of the Train data set in one go
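A helper along these lines can draw a boxplot and histogram for one feature in a single figure; this is a sketch, and the notebook's actual helper may differ in styling:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

def histogram_boxplot(series, bins=30):
    """Boxplot on top, histogram below, sharing the x-axis."""
    fig, (ax_box, ax_hist) = plt.subplots(
        nrows=2, sharex=True, gridspec_kw={"height_ratios": (0.25, 0.75)}
    )
    ax_box.boxplot(series, vert=False)
    ax_hist.hist(series, bins=bins)
    ax_hist.axvline(np.mean(series), linestyle="--")  # mark the mean
    return fig

fig = histogram_boxplot(np.random.default_rng(1).normal(size=500))
print(len(fig.axes))  # one boxplot axis + one histogram axis
```

Looping this helper over all 40 predictor columns gives the "all features in one go" view.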

Observation:

Observation:

Labelled bar plot of the target variable on the Train data

Observation:

Labelled bar plot of the target variable on the Test data

Observation:

Bivariate Analysis

Histogram plots to view the distribution of the columns on the Train data

Histogram plots to view the distribution of the columns on the Test data

Observation:

Heat map to check the correlation on the Test Data

Heat map to check the correlation on the Train Data
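A dependency-light sketch of a correlation heatmap, using pandas `.corr()` plus matplotlib's `imshow` (the notebook may use seaborn's `heatmap` instead; the data and column names here are synthetic stand-ins):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the ciphered predictors.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=[f"V{i}" for i in range(1, 6)])

corr = df.corr()  # pairwise Pearson correlations
fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)), corr.columns)
ax.set_yticks(range(len(corr)), corr.columns)
fig.colorbar(im)
print(corr.shape)
```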

Observation:

Data Pre-processing

Missing Value Imputation
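The usual pattern, sketched with scikit-learn's `SimpleImputer`: fit on the training data only, then apply the same statistics to the test data so no information leaks from the test set. The tiny arrays are illustrative:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
X_test = np.array([[np.nan, 2.0]])

# Fit on the training rows only; reusing the fitted imputer on the test
# rows applies the *training* medians, avoiding leakage.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
print(X_test_imp)  # NaN filled with the train median of column 0, i.e. 3.0
```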

Observation:

Model Building

Model evaluation criterion

The nature of predictions made by the classification model will translate as follows:

- True positive: a failure predicted correctly; the generator is repaired in time (repair cost).
- False negative: a real failure the model misses; the generator breaks and must be replaced (replacement cost).
- False positive: a failure predicted where none occurs; only an inspection cost is incurred.

Since replacement is the costliest outcome, false negatives hurt the most.

Which metric to optimize?

Let's define a function to output different metrics (including recall) on the train and test sets, and a function to show the confusion matrix, so that we do not have to repeat the same code while evaluating models.
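A sketch of such a helper, shown on a throwaway synthetic dataset (the real notebook applies it to the train/validation/test splits):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.tree import DecisionTreeClassifier

def model_performance(model, X, y):
    """Return the standard classification metrics for a fitted model."""
    pred = model.predict(X)
    return {
        "Accuracy": accuracy_score(y, pred),
        "Recall": recall_score(y, pred),
        "Precision": precision_score(y, pred),
        "F1": f1_score(y, pred),
    }

# Quick demonstration on synthetic data.
X, y = make_classification(n_samples=200, random_state=1)
model = DecisionTreeClassifier(random_state=1).fit(X, y)
scores = model_performance(model, X, y)
print(scores["Recall"])  # 1.0: an unpruned tree fits its training data perfectly
```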

Defining scorer to be used for cross-validation and hyperparameter tuning
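Recall on the positive ("failure") class is the quantity to maximize, so it can be wrapped with `make_scorer` and handed to cross-validation and the search routines. The dataset here is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Scorer reused by cross_val_score, GridSearchCV, and RandomizedSearchCV.
scorer = make_scorer(recall_score)

# Imbalanced synthetic data (few "failures"), mimicking the real target.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=1)
cv_recall = cross_val_score(DecisionTreeClassifier(random_state=1), X, y,
                            scoring=scorer, cv=5)
print(cv_recall.shape)  # 5 folds -> 5 recall values
```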

Model Building with original data

Sample Decision Tree model building with original data

Observation:

Model Building with Oversampled data
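The notebook likely uses imblearn's `SMOTE` or `RandomOverSampler` for this step; as a dependency-light sketch of the same idea, plain random oversampling with scikit-learn's `resample` duplicates minority-class rows until the classes balance:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)  # imbalanced: few failures

# Resample the minority class with replacement up to the majority count.
# imblearn's SMOTE instead synthesizes new points between neighbors.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=90, replace=True, random_state=1)
X_over = np.vstack([X[y == 0], X_up])
y_over = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_over))  # [90 90]
```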

Observation:

Model Building with Undersampled data
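Undersampling balances the classes the opposite way, by discarding majority-class rows. Again a dependency-light sketch with `resample` (the notebook may use imblearn's `RandomUnderSampler`):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)  # imbalanced: few failures

# Randomly drop majority-class rows down to the minority count.
X_maj, y_maj = X[y == 0], y[y == 0]
X_down, y_down = resample(X_maj, y_maj, n_samples=10, replace=False,
                          random_state=1)
X_under = np.vstack([X_down, X[y == 1]])
y_under = np.concatenate([y_down, y[y == 1]])
print(np.bincount(y_under))  # [10 10]
```

The trade-off: undersampling trains faster but throws away data, while oversampling keeps every observation at the cost of duplicated (or synthetic) minority rows.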

Observation:

Insights and Model Selection:

Hyperparameter Tuning

Sample Parameter Grids

Hyperparameter tuning can take a long time to run; to keep the runtime manageable, you can use the following grids wherever required.

# Likely for GradientBoostingClassifier (inferred from the parameter names)
param_grid = {
    "n_estimators": np.arange(100, 150, 25),
    "learning_rate": [0.2, 0.05, 1],
    "subsample": [0.5, 0.7],
    "max_features": [0.5, 0.7],
}

# Likely for AdaBoostClassifier (note: base_estimator was renamed to
# estimator in scikit-learn 1.2)
param_grid = {
    "n_estimators": [100, 150, 200],
    "learning_rate": [0.2, 0.05],
    "base_estimator": [
        DecisionTreeClassifier(max_depth=1, random_state=1),
        DecisionTreeClassifier(max_depth=2, random_state=1),
        DecisionTreeClassifier(max_depth=3, random_state=1),
    ],
}

# Likely for BaggingClassifier
param_grid = {
    "max_samples": [0.8, 0.9, 1],
    "max_features": [0.7, 0.8, 0.9],
    "n_estimators": [30, 50, 70],
}

# Likely for RandomForestClassifier (the arange must be unpacked into the
# candidate list; nesting the array as a single candidate would fail at fit time)
param_grid = {
    "n_estimators": [200, 250, 300],
    "min_samples_leaf": np.arange(1, 4),
    "max_features": list(np.arange(0.3, 0.6, 0.1)) + ["sqrt"],
    "max_samples": np.arange(0.4, 0.7, 0.1),
}

# Likely for DecisionTreeClassifier
param_grid = {
    "max_depth": np.arange(2, 6),
    "min_samples_leaf": [1, 4, 7],
    "max_leaf_nodes": [10, 15],
    "min_impurity_decrease": [0.0001, 0.001],
}

# Likely for LogisticRegression
param_grid = {"C": np.arange(0.1, 1.1, 0.1)}

# Likely for XGBClassifier (scale_pos_weight is XGBoost-specific)
param_grid = {
    "n_estimators": [150, 200, 250],
    "scale_pos_weight": [5, 10],
    "learning_rate": [0.1, 0.2],
    "gamma": [0, 3, 5],
    "subsample": [0.8, 0.9],
}

Grid Search CV

Performing grid search on the Bagging classifier, Random Forest classifier, and XGBoost classifier using the oversampled data

Bagging Classifier Grid Search Cross-Validation
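A sketch of the grid-search step, with a deliberately tiny grid and synthetic data so it runs quickly; the fuller Bagging grid from the section above drops in unchanged:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

# Imbalanced synthetic stand-in for the oversampled training data.
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=1)

# Tiny grid so the sketch stays fast; GridSearchCV tries every combination.
param_grid = {"n_estimators": [10, 30], "max_samples": [0.8, 1.0]}

grid = GridSearchCV(
    BaggingClassifier(random_state=1),
    param_grid,
    scoring=make_scorer(recall_score),  # optimize recall, as defined earlier
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```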

Observation:

Random Forest Classifier Grid Search Cross-Validation

Observation:

XGBoost Classifier Grid Search Cross-Validation

Observation:

RandomizedSearchCV

Performing random search on the Bagging classifier, Random Forest classifier, and XGBoost classifier using the oversampled data
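Unlike grid search, random search samples a fixed number of parameter combinations (`n_iter`) instead of trying them all, which is why it scales better to large grids. A sketch with a Random Forest on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RandomizedSearchCV

# Imbalanced synthetic stand-in for the oversampled training data.
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=1)

param_dist = {"n_estimators": [50, 100], "min_samples_leaf": np.arange(1, 4)}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_dist,
    n_iter=4,  # sample 4 of the 6 possible combinations
    scoring=make_scorer(recall_score),
    cv=3,
    random_state=1,
)
search.fit(X, y)
print(search.best_params_)
```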

Bagging Classifier Random Search Cross-Validation

Observation:

Random Forest Classifier Random Search Cross-Validation

Observation:

XGBoost Classifier Random Search Cross-Validation

Observation:

Model performance comparison and choosing the final model

Train set model performance scores comparison

Validation set model performance scores comparison

Observation:

Model Performance on the unseen Test Data

Choosing the Random Forest model tuned with random search as the final model and evaluating it on the unseen Test data

Observation:

Feature Importance as per the Final Model
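Impurity-based importances from the fitted forest can rank the ciphered predictors. A sketch on synthetic data, with illustrative V1..V5 names standing in for the 40 real columns:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X, y)

# Rank features by mean decrease in impurity, highest first.
names = [f"V{i}" for i in range(1, 6)]
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(f"{names[i]}: {model.feature_importances_[i]:.3f}")
print(round(model.feature_importances_.sum(), 6))  # importances sum to 1
```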

Observation:

Pipelines to build the final model
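A sketch of such a pipeline, bundling the median imputer with the tuned classifier so any unseen sensor readings pass through identical preprocessing. The data is synthetic, and the hyperparameters of the real final model would come from the search results:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=5, random_state=1)
X[::17, 0] = np.nan  # sprinkle in missing values, like the real sensor data

# One object to fit, persist, and deploy: imputation + tuned final model.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("model", RandomForestClassifier(random_state=1)),
])
pipe.fit(X, y)
print(pipe.predict(X[:3]).shape)  # one prediction per input row
```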

Observations:

Business Insights

Recommendations & Conclusions